Multi-Label Multi-Kernel Transfer Learning for Human Protein Subcellular Localization

نویسنده

  • Suyu Mei
چکیده

Recent years have witnessed much progress in computational modelling for protein subcellular localization. However, the existing sequence-based predictive models demonstrate moderate or unsatisfactory performance, and the gene ontology (GO) based models may take the risk of performance overestimation for novel proteins. Furthermore, many human proteins have multiple subcellular locations, which renders the computational modelling more complicated. Up to the present, there are far few researches specialized for predicting the subcellular localization of human proteins that may reside in multiple cellular compartments. In this paper, we propose a multi-label multi-kernel transfer learning model for human protein subcellular localization (MLMK-TLM). MLMK-TLM proposes a multi-label confusion matrix, formally formulates three multi-labelling performance measures and adapts one-against-all multi-class probabilistic outputs to multi-label learning scenario, based on which to further extends our published work GO-TLM (gene ontology based transfer learning model for protein subcellular localization) and MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) for multiplex human protein subcellular localization. With the advantages of proper homolog knowledge transfer, comprehensive survey of model performance for novel protein and multi-labelling capability, MLMK-TLM will gain more practical applicability. The experiments on human protein benchmark dataset show that MLMK-TLM significantly outperforms the baseline model and demonstrates good multi-labelling ability for novel human proteins. Some findings (predictions) are validated by the latest Swiss-Prot database. The software can be freely downloaded at http://soft.synu.edu.cn/upload/msy.rar.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

HPSLPred: An Ensemble Multi-label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source

Predicting the subcellular localization of proteins is an important and challenging problem. Traditional experimental approaches are often expensive and time-consuming. Consequently, a growing number of research efforts employ a series of machine learning approaches to predict the subcellular location of proteins. There are two main challenges among the state-of-the-art prediction methods. Firs...

متن کامل

Human Protein Subcellular Localization with Integrated Source and Multi-label Ensemble Classifier

Predicting protein subcellular location is necessary for understanding cell function. Several machine learning methods have been developed for computational prediction of primary protein sequences because wet experiments are costly and time consuming. However, two problems still exist in state-of-the-art methods. First, several proteins appear in different subcellular structures simultaneously,...

متن کامل

An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues

MOTIVATION Human cells are organized into compartments of different biochemical cellular processes. Having proteins appear at the right time to the correct locations in the cellular compartments is required to conduct their functions in normal cells, whereas mislocalization of proteins can result in pathological diseases, including cancer. RESULTS To reveal the cancer-related protein mislocal...

متن کامل

A comparison of computational strategies for multi-label prediction of protein subcellular localizations

The subcellular localization of a protein is closely correlated with its functions. Although many machine learning algorithms have been developed and applied to predict protein compartments using different data sources, such as protein amino acid sequence and motif information, automatic prediction of subcellular localization remains a challenging problem. In this study, we compared three suppo...

متن کامل

Prediction of Protein Subcellular Multi-locations with a Min-Max Modular Support Vector Machine

How to predict subcellular multi-locations of proteins with machine learning techniques is a challenging problem in computational biology community. Regarding the protein multi-location problem as a multi-label pattern classification problem, we propose a new predicting method for dealing with the protein subcellular localization problem in this paper. Two key points of the proposed method are ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2012